Search CORE

5 research outputs found

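APPLICATION OF A MULTIPLE PATTERN MATCHING ALGORITHM BASED ON THE Q-GRAM TECHNIQUE TO APPROXIMATE PATTERN MATCHING (original Polish title: ZASTOSOWANIE ALGORYTMU WYSZUKIWANIA WIELU WZORCÓW OPARTEGO O TECHNIKĘ Q-GRAMÓW DO WYSZUKIWANIA PRZYBLIŻONEGO)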

Author: Susik Robert
Publication venue: Index Copernicus
Publication date: 01/01/2017

We consider the application of a multiple pattern matching algorithm (Multi AOSO on q-Grams, MAG) to approximate pattern matching. We propose an on-line approach that reduces approximate pattern matching to multiple pattern matching (known as partitioning into exact search). The presented solution allows relatively fast searching for multiple patterns in text with at most k differences (Levenshtein distance) or k mismatches (Hamming distance). This paper compares the solution based on the MAG algorithm with [4]. Experiments on DNA, English, Proteins and XML texts with various values of k show that the newly proposed algorithm achieves relatively good results in practical use.

Biblioteka Nauki - repozytorium artykułów (Library of Science article repository)

Lublin University of Technology Journals
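The reduction described in the abstract above (partitioning into exact search) rests on the pigeonhole principle: if a pattern is split into k+1 non-overlapping pieces, every occurrence with at most k differences contains at least one piece verbatim, so exact hits of the pieces mark the only regions worth verifying. The following is a minimal sketch of that reduction for Levenshtein distance only, under stated assumptions: Python's `str.find` stands in for the MAG multiple pattern search, candidate regions are verified with Sellers' dynamic programming, and the helper names (`split_pattern`, `sellers_ends`, `approximate_search`) are hypothetical, not taken from the paper.

```python
def split_pattern(pattern: str, k: int) -> list[tuple[int, str]]:
    """Split the pattern into k+1 non-overlapping pieces, returned as (offset, piece)."""
    m, parts = len(pattern), k + 1
    if m < parts:
        raise ValueError("pattern must have at least k+1 characters")
    pieces, start = [], 0
    for i in range(parts):
        # Spread the remainder over the first pieces so lengths differ by at most one.
        length = m // parts + (1 if i < m % parts else 0)
        pieces.append((start, pattern[start:start + length]))
        start += length
    return pieces


def sellers_ends(pattern: str, text: str, k: int) -> list[int]:
    """Sellers' DP: end positions e with ed(pattern, text[s:e]) <= k for some s."""
    m = len(pattern)
    prev = list(range(m + 1))
    ends = []
    for e, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start at any position
        for i in range(1, m + 1):
            cur[i] = min(prev[i] + 1,                           # extra text character
                         cur[i - 1] + 1,                        # pattern character missing
                         prev[i - 1] + (pattern[i - 1] != c))  # match or substitution
        if cur[m] <= k:
            ends.append(e)
        prev = cur
    return ends


def approximate_search(pattern: str, text: str, k: int) -> list[int]:
    """End positions of occurrences with at most k differences (Levenshtein)."""
    m, n = len(pattern), len(text)
    ends = set()
    for offset, piece in split_pattern(pattern, k):
        # Exact search for each piece; str.find stands in for a multiple pattern matcher.
        pos = text.find(piece)
        while pos != -1:
            # An occurrence containing this piece starts within k of pos - offset,
            # so it lies entirely inside [pos - offset - k, pos - offset + m + k).
            lo = max(0, pos - offset - k)
            hi = min(n, pos - offset + m + k)
            ends.update(lo + e for e in sellers_ends(pattern, text[lo:hi], k))
            pos = text.find(piece, pos + 1)
    return sorted(ends)
```

When several patterns are searched, the pieces of all of them would be handed to one multiple pattern matcher, so a single pass over the text yields candidates for every pattern; the Hamming (mismatch) variant follows the same scheme with a simpler verification step.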

A Bloom filter based semi-index on q-grams

Author: Grabowski Szymon
Raniszewski Marcin
Susik Robert
Publication date: 10/07/2015

We present a simple q-gram based semi-index that typically needs to look for a pattern in only a small fraction of the text blocks. Several space-time tradeoffs are presented. Experiments on the Pizza & Chili datasets show that our solution is up to three orders of magnitude faster than the semi-index of Claude et al. at comparable space usage.

arXiv.org e-Print Archive
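The abstract above does not spell out the index layout, so the following is only a minimal sketch of the general idea under a hypothetical design: the text is cut into fixed-size blocks, each block (extended by `max_pattern_len - 1` characters so that an occurrence starting in it lies wholly inside) stores a Bloom filter of its q-grams, and a query scans only the blocks whose filter reports every q-gram of the pattern. Because Bloom filters give false positives but never false negatives, skipping a block is always safe. The class names, the blake2b double-hashing scheme and all parameters are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain bit-array Bloom filter using double hashing over one blake2b digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


class QGramSemiIndex:
    """Hypothetical semi-index: one Bloom filter of q-grams per (extended) text block."""

    def __init__(self, text: str, q: int, block_size: int, max_pattern_len: int,
                 num_bits: int = 4096, num_hashes: int = 3) -> None:
        self.text, self.q, self.block_size = text, q, block_size
        self.overlap = max_pattern_len - 1
        self.filters = []
        for start in range(0, len(text), block_size):
            # Extend the block so any occurrence starting inside it is fully covered.
            chunk = text[start:start + block_size + self.overlap]
            bf = BloomFilter(num_bits, num_hashes)
            for i in range(len(chunk) - q + 1):
                bf.add(chunk[i:i + q])
            self.filters.append(bf)

    def search(self, pattern: str) -> list[int]:
        m, q = len(pattern), self.q
        if not q <= m <= self.overlap + 1:
            raise ValueError("pattern length must lie in [q, max_pattern_len]")
        grams = {pattern[i:i + q] for i in range(m - q + 1)}
        hits = []
        for b, bf in enumerate(self.filters):
            if not all(g in bf for g in grams):
                continue  # some q-gram is certainly absent: skip the whole block
            start = b * self.block_size
            end = min(start + self.block_size, len(self.text))
            # Verify by scanning for occurrences that start inside [start, end).
            pos = self.text.find(pattern, start)
            while pos != -1 and pos < end:
                hits.append(pos)
                pos = self.text.find(pattern, pos + 1)
        return hits
```

Larger blocks or smaller filters shrink the index but let more blocks through to the scan, which is one way the space-time tradeoffs mentioned in the abstract can arise.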

Revisiting Multiple Pattern Matching

Author: Fredriksson Kimmo
Grabowski Szymon
Susik Robert
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/12/2019

We consider the classical exact multiple string matching problem. The proposed solution is based on a combination of a few ideas: using q-grams instead of single characters, pattern superimposition, bit-parallelism and alphabet size reduction. We discuss the pros and cons of various alternatives for achieving the best possible combination of techniques. The main contributions of this paper are several alphabet mapping methods that reduce memory requirements and allow larger q-grams to be used. The experimental results show that the presented algorithm is competitive in most practical cases. One of the tests also shows that tailoring our scheme to search over byte-encoded text yields speedups compared with searching over plain text.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)
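Three of the ingredients listed in the abstract above (q-grams, pattern superimposition and bit-parallelism), together with a toy alphabet reduction, can be illustrated with plain Shift-Or. This is not the authors' algorithm: the sketch assumes all patterns share one length m ≥ q, uses textbook Shift-Or, and reduces the alphabet with a hypothetical `ord(c) % sigma_r` mapping rather than the paper's mapping methods. Equal q-grams always receive equal codes, so the mapping can only add false candidates, and the final verification discards them.

```python
def qgram_codes(s: str, q: int, sigma_r: int) -> list[int]:
    """Map each character to ord(c) % sigma_r and pack q consecutive codes into one integer."""
    codes = [ord(c) % sigma_r for c in s]
    out = []
    for i in range(len(s) - q + 1):
        v = 0
        for c in codes[i:i + q]:
            v = v * sigma_r + c
        out.append(v)
    return out


def multi_shift_or(patterns: list[str], text: str, q: int = 2,
                   sigma_r: int = 8) -> list[tuple[int, str]]:
    """Return (start, pattern) for every exact occurrence of any pattern."""
    m = len(patterns[0])
    if m < q or any(len(p) != m for p in patterns):
        raise ValueError("patterns must share one length m >= q")
    L = m - q + 1                  # q-grams per pattern = automaton states
    all_ones = (1 << L) - 1
    # Superimposition: bit i of masks[g] is 0 if some pattern has q-gram code g
    # at q-gram position i; codes absent from every pattern keep all bits set.
    masks: dict[int, int] = {}
    for p in patterns:
        for i, g in enumerate(qgram_codes(p, q, sigma_r)):
            masks[g] = masks.get(g, all_ones) & ~(1 << i)
    targets = set(patterns)
    state = all_ones
    matches = []
    for j, g in enumerate(qgram_codes(text, q, sigma_r)):
        state = ((state << 1) | masks.get(g, all_ones)) & all_ones
        if not state & (1 << (L - 1)):
            # Candidate: L consecutive text q-grams fit the superimposed pattern.
            start = j - L + 1
            window = text[start:start + m]
            if window in targets:  # verification removes false positives
                matches.append((start, window))
    return matches
```

A smaller `sigma_r` shrinks the mask table from `sigma**q` toward `sigma_r**q` entries, which is what makes larger q affordable, at the cost of more candidates reaching verification.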

Engineering the counting filter for string matching algorithms

Author: Grabowski Szymon
Susik Robert
Publication venue: Lodz University of Technology Press
Publication date: 01/01/2016

We consider a new approach to the popular counting filter technique for approximate pattern matching. Our solution is based on q-grams combined with alphabet reduction by bin packing and on Streaming SIMD Extensions (SSE). We present several variants that use these techniques and discuss their pros and cons for two approximate pattern matching problems. The first is the well-known matching with k-differences; the second is a biological problem of DNA sequence mutation called matching with inversions and translocations. The experimental results show the effectiveness of our ideas: they speed up the counting filter and reduce the number of verifications by orders of magnitude.

Lodz University of Technology Repository
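For reference, the classic counting filter that the abstract above builds on can be sketched for single characters (q = 1): the length-m text window ending at position e can end an occurrence with at most k differences only if it shares at least m - k characters, counted as a multiset, with the pattern. Only windows passing this test are verified. The paper's q-gram, bin-packing and SSE refinements are not reproduced here; the function names are hypothetical, and verification uses Sellers' dynamic programming.

```python
from collections import Counter


def counting_filter_candidates(pattern: str, text: str, k: int):
    """Yield end positions e whose window text[max(0, e - m):e] shares >= m - k chars with pattern."""
    m = len(pattern)
    need = Counter(pattern)
    window = Counter()
    shared = 0  # sum over characters c of min(window[c], need[c])
    for e, c in enumerate(text, start=1):
        window[c] += 1
        if window[c] <= need[c]:
            shared += 1
        if e > m:  # slide: drop the character that leaves the window
            old = text[e - m - 1]
            if window[old] <= need[old]:
                shared -= 1
            window[old] -= 1
        if shared >= m - k:
            yield e


def ends_with_match(pattern: str, segment: str, k: int) -> bool:
    """Sellers' DP: True if some suffix of segment is within edit distance k of pattern."""
    m = len(pattern)
    prev = list(range(m + 1))
    for c in segment:
        cur = [0] * (m + 1)  # an occurrence may start at any position
        for i in range(1, m + 1):
            cur[i] = min(prev[i] + 1, cur[i - 1] + 1, prev[i - 1] + (pattern[i - 1] != c))
        prev = cur
    return prev[m] <= k


def counting_filter_search(pattern: str, text: str, k: int) -> list[int]:
    """End positions of occurrences with at most k differences, filter then verify."""
    m = len(pattern)
    if not 0 <= k < m:
        raise ValueError("need 0 <= k < len(pattern)")
    # An occurrence is at most m + k long, so verifying m + k characters suffices.
    return [e for e in counting_filter_candidates(pattern, text, k)
            if ends_with_match(pattern, text[max(0, e - m - k):e], k)]
```

As k grows, the threshold m - k drops and more windows reach verification; cutting that number is what the paper's refinements target.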